### Maximum-speed implementation of ORCA

2019.08.09 Fri

Choi, Geunseok, Undergraduate Student, CAD & SoC Design Lab., Department of Electrical Engineering, Pohang University of Science and Technology

Objective: Minimize { Design Area × Minimum Clock Period }



### **The Conventional**

Worst Negative Slack = 1.72 ns

**Specified Clock Period = 8 ns** 

Minimum clock period = 9.72 ns

Design Area =  $629209.177001 \text{ nm}^2$ 

Voltage Supply = 1.32V

Score (Design Area × Minimum Clock Period) = 6115913.2

Number of DRC errors = 10
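The conventional score can be reproduced from the figures above. The sketch below assumes, as the slide's numbers imply, that the minimum clock period is the specified period plus the magnitude of the worst negative slack; the function names are illustrative only.

```python
def minimum_clock_period(specified_period_ns, wns_magnitude_ns):
    """Smallest period that would meet timing: specified period plus the violation."""
    return specified_period_ns + wns_magnitude_ns


def score(design_area, min_period_ns):
    """Objective to minimize: design area times minimum clock period."""
    return design_area * min_period_ns


conventional_period = minimum_clock_period(8.0, 1.72)
conventional_score = score(629209.177001, conventional_period)
print(f"{conventional_period:.2f} ns, score = {conventional_score:.1f}")
# prints: 9.72 ns, score = 6115913.2
```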

### 2. Structure

✓ Start the back-end flow with the design netlist (.v), constraints/targets (.sdc), conditions/modes (MMMC), and libraries (.lef)




# **3. Flow – overall**

✓ Floorplan/Powerplan → Placement → Clock tree synthesis → Routing





# 3. Flow - Data setup

### 1) Design library/data

open\_mw\_cel -library orca\_lib.mw orca\_setup

set\_tlu\_plus\_files -max\_tluplus \$tlup\_max -min\_tluplus \$tlup\_min -tech2itf\_map \$tlup\_map

### 2) Design optimization

set\_host\_options -max\_cores 8

set\_max\_area 0

### 3) Check over-constraint

Path Group: INPUTS

Path Type: max

| Point                                                        | Incr   | Path   |
|--------------------------------------------------------------|--------|--------|
| clock SDRAM_CLK (fall edge)                                  | 3.75   | 3.75   |
| clock network delay (ideal)                                  | 0.00   | 3.75   |
| input external delay                                         | 0.80   | 4.55 r |
| sd_DQ[0] (inout)                                             | 0.00 z | 4.55 r |
| sdram_DQ_iopad_0/PAD (pc3b05)                                | 0.00 z | 4.55 r |
| sdram_DQ_iopad_0/CIN (pc3b05)                                | 0.93 z | 5.48 r |
| I_ORCA_TOP/sd_DQ_in[0] (ORCA_TOP)                            | 0.00 z | 5.48 r |
| I_ORCA_TOP/I_SDRAM_TOP/sd_DQ_in[0] (SDRAM_TOP)               | 0.00 z | 5.48 r |
| I_ORCA_TOP/I_SDRAM_TOP/I_SDRAM_IF/sd_DQ_in[0] (SDRAM_IF)     | 0.00 z | 5.48 r |
| I_ORCA_TOP/I_SDRAM_TOP/I_SDRAM_IF/DQ_in_0_reg_0_/D (sdnrq1)  | 0.00 z | 5.48 r |
| data arrival time                                            |        | 5.48   |
|                                                              |        |        |
| clock SDRAM_CLK (rise edge)                                  | 7.50   | 7.50   |
| clock network delay (ideal)                                  | 0.00   | 7.50   |
| clock uncertainty                                            | -0.10  | 7.40   |
| I_ORCA_TOP/I_SDRAM_TOP/I_SDRAM_IF/DQ_in_0_reg_0_/CP (sdnrq1) | 0.00   | 7.40 r |
| library setup time                                           | -0.22  | 7.18   |
| data required time                                           |        | 7.18   |
|                                                              |        |        |
| data required time                                           |        | 7.18   |
| data arrival time                                            |        | -5.48  |
|                                                              |        |        |
| slack (MET)                                                  |        | 1.70   |
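The slack in the report above follows from a short calculation. This is a minimal sketch using the values of the INPUTS path; the helper names are hypothetical and the model covers only the terms that appear in this report (ideal clocks, one uncertainty term, one setup time).

```python
def data_arrival(launch_edge, clock_network_delay, input_delay, path_delays):
    """Arrival time at the capturing register's D pin."""
    return launch_edge + clock_network_delay + input_delay + sum(path_delays)


def data_required(capture_edge, clock_network_delay, uncertainty, setup_time):
    """Latest allowed arrival: capture edge minus uncertainty and library setup."""
    return capture_edge + clock_network_delay - uncertainty - setup_time


# Values from the SDRAM_CLK path: fall edge at 3.75 launches, rise edge at 7.50 captures.
arrival = data_arrival(3.75, 0.00, 0.80, [0.93])
required = data_required(7.50, 0.00, 0.10, 0.22)
slack = required - arrival
print(f"arrival={arrival:.2f} required={required:.2f} slack={slack:.2f}")
# prints: arrival=5.48 required=7.18 slack=1.70
```

A positive slack means the path is met even with zero interconnect delay, so the constraint itself is achievable.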

### Zero-interconnect timing check

set\_zero\_interconnect\_delay\_mode true

report\_constraint -all

report\_timing

set\_zero\_interconnect\_delay\_mode false



## 3. Flow – Floorplan/Powerplan

### 1) Create P/G pads and connect

source -echo scripts/pad\_cell\_cons.tcl

```
create_floorplan -core_utilization 0.5 -left_io2core 10.0 -bottom_io2core 10.0 -right_io2core 10.0 -top_io2core 10.0
```

insert\_pad\_filler -cell "pfeed10000 pfeed05000 pfeed02000 pfeed01000 pfeed00500 pfeed00200 pfeed00100 pfeed00050 pfeed00010 pfeed00005"

source -echo scripts/connect\_pg.tcl

create\_pad\_rings

save\_mw\_cel -as floorplan\_init

### 2) Macro cell placement

source -echo scripts/preplace\_macros.tcl

```
set_fp_placement_strategy -auto_grouping high -macros_on_edge on -sliver_size 10 -virtual_IPO on
```

set\_fp\_macro\_options -legal\_orientation {W E} [get\_cells I\_ORCA\_TOP/I\_PCI\_TOP/I\_PCI\_WRITE\_FIFO/PCI\_FIFO\_RAM\_\*]

create\_fp\_placement -timing\_driven -no\_hierarchy\_gravity

set\_keepout\_margin -type hard -all\_macros -outer {10 10 10 10}

create\_fp\_placement -timing\_driven -no\_hierarchy\_gravity

set\_dont\_touch\_placement [all\_macro\_cells]



## 3. Flow – Floorplan/Powerplan

### 1) Check congestion – overflowed GRCs are less than 0.1%

create\_fp\_placement -timing\_driven -no\_hierarchy\_gravity

route\_zrt\_global -congestion\_map\_only true -exploration true

phase1. Routing result:

| phase1.   | Overflow | Max | (GRCs) | GRCs | %     |
|-----------|----------|-----|--------|------|-------|
| Both Dirs | 171      | 2   |        | 316  | 0.03% |
| H routing | 73       | 1   | 28     | 232  | 0.05% |
| V routing | 98       | 2   | 16     | 84   | 0.02% |
| METAL     | 32       | 0   | 180    | 180  | 0.04% |
| METAL2    | 98       | 2   | 16     | 84   | 0.02% |
| METAL3    | 40       | 1   | 28     | 52   | 0.01% |
| METAL4    | 0        | 0   | 0      | 0    | 0.00% |
| METAL5    | 0        | 0   | 0      | 0    | 0.00% |
| METAL6    | 0        | 0   | 0      | 0    | 0.00% |

### 2) Check IR-drop – left: conventional / right: optimized

analyze\_fp\_rail -nets {VDD VSS} -voltage\_supply 1.32 -power\_budget 350 -pad\_masters { pv0i pvdi }

### **3. Flow – Placement**

| Step                   | Command                                                                                                            |
|------------------------|--------------------------------------------------------------------------------------------------------------------|
| 1) Placement           | create_placement_blockage -coordinate {{345.415 345.145} {2141.135 367.190}} -name placement_blockage_3 -type hard |
|                        | place_opt -area_recovery -effort high -power -congestion                                                           |
| 2) Congestion analysis | route_zrt_global -cong true -exploration true                                                                      |
| 3) Timing analysis     | report_constraint -max_delay -all_violators -scenario [all_active_scenarios]                                       |
| 4) Optimization        | psynopt -area_recovery -power -congestion                                                                          |

Timing analysis result:

| Report              | constraint -all_violators -max_delay |
|---------------------|--------------------------------------|
| Version             | O-2018.06                            |
| Date                | Thu Aug 8 15:33:20 2019              |
| Parasitic source    | LPE                                  |
| Parasitic mode      | RealRVirtualC                        |
| Extraction mode     | MIN_MAX                              |
| Extraction derating | 125/125/125                          |

This design has no violated constraints.

Congestion analysis result (phase1. Routing result):

| phase1.   | Overflow | Max | (GRCs) | GRCs | %     |
|-----------|----------|-----|--------|------|-------|
| Both Dirs | 26       | 1   |        | 128  | 0.01% |
| H routing | 26       | 1   | 12     | 128  | 0.03% |
| V routing | 0        | 0   | 0      | 0    | 0.00% |
| METAL     | 13       | 0   | 104    | 104  | 0.02% |
| METAL2    | 0        | 0   | 0      | 0    | 0.00% |
| METAL3    | 4        | 1   | 4      | 12   | 0.00% |
| METAL4    | 0        | 0   | 0      | 0    | 0.00% |
| METAL5    | 8        | 1   | 8      | 12   | 0.00% |
| METAL6    | 0        | 0   | 0      | 0    | 0.00% |
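The congestion criterion used in this flow (overflowed GRCs below 0.1%) can be checked automatically from the global-route log. The following is a rough sketch, not part of the original flow: the regular expression assumes the `phase1.` line layout shown in the report, and `overflow_percentages` and `congestion_ok` are hypothetical helper names.

```python
import re

# Matches e.g. "phase1. H routing: Overflow = 26 Max = 1 (GRCs = 12) GRCs = 128 (0.03%)"
LINE_RE = re.compile(
    r"phase1\.\s+(?P<name>.+?):?\s+Overflow\s*=\s*(?P<overflow>\d+)"
    r".*?GRCs\s*=\s*(?P<grcs>\d+)\s*\((?P<pct>[\d.]+)%\)\s*$"
)

REPORT = """\
phase1. Routing result:
phase1. Both Dirs: Overflow = 26 Max = 1 GRCs = 128 (0.01%)
phase1. H routing: Overflow = 26 Max = 1 (GRCs = 12) GRCs = 128 (0.03%)
phase1. V routing: Overflow = 0 Max = 0 (GRCs = 0) GRCs = 0 (0.00%)
"""


def overflow_percentages(report):
    """Map each report row name to its percentage of overflowed GRCs."""
    result = {}
    for line in report.splitlines():
        match = LINE_RE.search(line)
        if match:
            result[match.group("name")] = float(match.group("pct"))
    return result


def congestion_ok(report, limit_pct=0.1):
    """True when every parsed row is below the limit (0.1% in this flow)."""
    pcts = overflow_percentages(report)
    return bool(pcts) and max(pcts.values()) < limit_pct


print(overflow_percentages(REPORT))
# {'Both Dirs': 0.01, 'H routing': 0.03, 'V routing': 0.0}
print(congestion_ok(REPORT))
# True
```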



# **3. Algorithm – CTS**

| Step                  | Command                                                                                   |
|-----------------------|-------------------------------------------------------------------------------------------|
| 1) CTS design setting | remove_ideal_network [all_fanout -flat -clock_tree]                                       |
|                       | set_delay_calculation_options -routed_clock arnoldi                                       |
|                       | set_clock_tree_options -target_skew 0.2                                                   |
|                       | set_clock_uncertainty 0.2 [all_clocks]                                                    |
|                       | set_clock_tree_references -references {bufbd1 bufbd2 bufbd4 bufbd7 bufbdf}                |
|                       | define_routing_rule CLOCK_DOUBLE_SPACING -spacings {METAL3 0.42 METAL4 0.63 METAL5 0.82}  |
| 2) CTS implement      | clock_opt -only_cts -no_clock_route -inter_clock_balance                                  |
|                       | extract_rc                                                                                |
|                       | clock_opt -only_psyn -no_clock_route -power -area_recovery                                |

1) Clock tree skew (Clock Tree Summary)

| Clock      | Sinks | CTBuffers | ClkCells | Skew   | LongestPath | TotalDRC |
|------------|-------|-----------|----------|--------|-------------|----------|
| pclk       | 623   | 28        | 32       | 0.3525 | 1.4725      | 0        |
| sdr_clk    | 2138  | 97        | 102      | 0.1851 | 1.5182      | 0        |
| PCI_CLK    | 621   | 21        | 23       | 0.0349 | 0.5452      | 0        |
| SDRAM_CLK  | 2137  | 90        | 93       | 0.4598 | 0.8920      | 0        |
| SD_DDR_CLK | 0     | 0         | 0        | 0.0000 | 0.0000      | 0        |
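One way to read the clock tree summary is to compare each clock's reported skew with the `-target_skew 0.2` setting. The sketch below is only an illustration under two assumptions: the skew column uses the same unit as the target, and clocks with no sinks are skipped. The helper name is hypothetical.

```python
TARGET_SKEW = 0.2  # value passed to set_clock_tree_options -target_skew

# (clock name, number of sinks, reported skew) copied from the summary table
CLOCK_SUMMARY = [
    ("pclk", 623, 0.3525),
    ("sdr_clk", 2138, 0.1851),
    ("PCI_CLK", 621, 0.0349),
    ("SDRAM_CLK", 2137, 0.4598),
    ("SD_DDR_CLK", 0, 0.0000),
]


def clocks_over_target(summary, target):
    """Names of clocks that drive at least one sink and exceed the skew target."""
    return [name for name, sinks, skew in summary if sinks > 0 and skew > target]


print(clocks_over_target(CLOCK_SUMMARY, TARGET_SKEW))
# ['pclk', 'SDRAM_CLK']
```

Exceeding the skew target is not by itself a timing failure; the path reports and slack summaries are what decide whether the design meets timing.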




# **3. Algorithm – CTS**

#### 2) Clock tree skew

Path Group: SYS\_CLK

Path Type: max

| Point                                                              | Incr  | Path   |
|--------------------------------------------------------------------|-------|--------|
| clock SYS_CLK (rise edge)                                          | 0.00  | 0.00   |
| clock network delay (propagated)                                   | 1.85  | 1.85   |
| I_ORCA_TOP/I_BLENDER_3/s3_op1_reg_0_/CP (sdnrq1)                   | 0.00  | 1.85 r |
| I_ORCA_TOP/I_BLENDER_3/s3_op1_reg_0_/Q (sdnrq1)                    | 0.39  | 2.24 r |
| I_ORCA_TOP/I_BLENDER_3/sub_171/B[0] (BLENDER_5_DW01_sub_0)         | 0.00  | 2.24 r |
| ...                                                                |       |        |
| clock SYS_CLK (rise edge)                                          | 8.00  | 8.00   |
| clock network delay (propagated)                                   | 1.75  | 9.75   |
| clock uncertainty                                                  | -0.20 | 9.55   |
| I_ORCA_TOP/I_BLENDER_3/s4_op2_reg_30_/CP (sdnrq1)                  | 0.00  | 9.55 r |
| library setup time                                                 | -0.17 | 9.39   |
| data required time                                                 |       | 9.39   |
|                                                                    |       |        |
| data required time                                                 |       | 9.39   |
| data arrival time                                                  |       | -9.14  |
|                                                                    |       |        |
| slack (MET)                                                        |       | 0.25   |

### 3) Clock tree slack & violation

Timing Path Group 'pclk'

| Levels of Logic:          | 0.00  |
|---------------------------|-------|
| Critical Path Length:     | 0.35  |
| Critical Path Slack:      | 7.21  |
| Critical Path Clk Period: | 15.00 |
| Total Negative Slack:     | 0.00  |
| No. of Violating Paths:   | 0.00  |
| Worst Hold Violation:     | 0.00  |
| Total Hold Violation:     | 0.00  |
| No. of Hold Violations:   | 0.00  |

Timing Path Group 'sdr\_clk'

| Levels of Logic:          | 1.00  |
|---------------------------|-------|
| Critical Path Length:     | 0.48  |
| Critical Path Slack:      | 6.50  |
| Critical Path Clk Period: | 7.50  |
| Total Negative Slack:     | 0.00  |
| No. of Violating Paths:   | 0.00  |
| Worst Hold Violation:     | 0.00  |
| Total Hold Violation:     | 0.00  |
| No. of Hold Violations:   | 0.00  |




# **3. Flow – Routing**

### 1) Route option setting

set\_delay\_calculation\_options -postroute arnoldi

set\_route\_zrt\_common\_options -post\_detail\_route\_redundant\_via\_insertion medium

set\_route\_zrt\_global\_options -timing\_driven true -crosstalk\_driven true

### 2) Initial route and optimize

route\_zrt\_group -all\_clock\_nets -reuse\_existing\_global\_route true

route\_opt -initial\_route\_only

route\_opt -skip\_initial\_route -effort high -xtalk\_reduction -power

### 3) Verification – DRCs & LVS

Verify Summary:

- Total number of nets = 54173, of which 0 are not extracted
- Total number of open nets = 0, of which 0 are frozen
- Total number of excluded ports = 0 ports of 0 unplaced cells connected to 0 nets
- 0 ports without pins of 0 cells connected to 0 nets
- 0 ports of 0 cover cells connected to 0 non-pg nets
- Total number of DRCs = 4
- Total number of antenna violations = no antenna rules defined
- Total number of voltage-area violations = no voltage-areas defined
- Total number of tie to rail violations = not checked
- Total number of tie to rail directly violations = not checked

icc\_shell> verify\_pg\_nets

- Cell orca\_setup.err existed already. Delete it ...
- Using [4 x 4] Fat Wire Table for METAL
- Using [4 x 4] Fat Wire Table for METAL2
- Using [4 x 4] Fat Wire Table for METAL3
- Using [4 x 4] Fat Wire Table for METAL4
- Using [3 x 3] Fat Wire Table for METAL5
- Using [3 x 3] Fat Wire Table for METAL6
- Checking [VSSQ]: There are no floating shapes. All the pins are connected. No errors are found.
- Checking [VDDQ]: There are no floating shapes. All the pins are connected. No errors are found.
- Checking [VSSO]: There are no floating shapes. All the pins are connected. No errors are found.
- Checking [VDDO]: There are no floating shapes. All the pins are connected. No errors are found.
- Checking [VSS]: There are no floating shapes. ERROR: There are 222 floating pins
- Checking [VDD]: There are no floating shapes. ERROR: There are 162 floating pins
- Checked 6 nets, 2 have Errors



# 4. Result

- ✓ Minimum clock period is reduced by 58.85% compared with the conventional design (9.72 ns → 4.00 ns)
- ✓ Score is reduced by 58.85% compared with the conventional design
- ✓ The number of DRC errors is reduced by 60% compared with the conventional design (10 → 4)
- ✓ Area is almost the same as the conventional design

|                | sys_clk (ns) | WNS (ns) | Minimum clk (ns) | Area      | Score      | Improvement (%) | DRC errors |
|----------------|--------------|----------|------------------|-----------|------------|-----------------|------------|
| Conventional   | 8.00         | 1.72     | 9.72             | 629209.18 | 6115913.20 | -               | 10         |
| EE753(2017) S1 | 4.00         | 0.78     | 4.78             | 659977.44 | 3154692.16 | 48.42           | 7          |
| EE753(2017) S2 | 6.00         | 0.14     | 6.14             | 645134.54 | 3961126.08 | 35.23           | 6          |
| mine           | 4.00         | 0.00     | 4.00             | 629135.91 | 2516543.66 | 58.85           | 4          |
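The Improvement column can be recomputed from the other columns. The sketch below assumes improvement is the relative score reduction against the conventional design, which matches the tabulated values; the names are illustrative. Because it uses the rounded areas from the table, the recomputed score for the last row differs from the tabulated one in the second decimal place.

```python
CONVENTIONAL_SCORE = 6115913.20


def improvement_pct(score, baseline=CONVENTIONAL_SCORE):
    """Relative score reduction versus the baseline, in percent."""
    return (1.0 - score / baseline) * 100.0


# name -> (area, minimum clock period in ns), taken from the result table
ROWS = {
    "EE753(2017) S1": (659977.44, 4.78),
    "EE753(2017) S2": (645134.54, 6.14),
    "mine": (629135.91, 4.00),
}

for name, (area, min_clk) in ROWS.items():
    row_score = area * min_clk
    print(f"{name}: improvement = {improvement_pct(row_score):.2f}%")
# EE753(2017) S1: improvement = 48.42%
# EE753(2017) S2: improvement = 35.23%
# mine: improvement = 58.85%
```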






